ABSTRACT

Estimation of regional distributions of the qualified military available (QMA) 
population is essential for determining an efficient allocation of recruiting resources. 
Estimates of regional mental ability distribution are required in order to estimate 
QMA. Using data from the Youth National Longitudinal Survey (NLSY), logit 
regression equations are used to estimate the probability that a 17 to 21 year old 
high school graduate will score above the 50th percentile on the Armed Forces 
Qualification Test (AFQT). This probability is modeled as a function of 
sociodemographic variables including gender, race/ethnicity, parents’ education, 
poverty status, income, residence in an urban area and receipt of welfare payments. 
Best fit equations are developed in order to facilitate calculation of nationwide
county-level AFQT distributions.
I. INTRODUCTION


Manpower analysts and demographers continue to forecast a decline in the 
qualified military available (QMA) population until the mid 1990’s. Nationwide, 
QMA population estimates of two million in 1990 are projected to decline to 1.8 
million by 1995 at which time QMA population will start to increase and reach the 
two million mark again by the year 2000.1 Depending on the magnitude of changes 
in military labor demand in response to proposed force reductions, a smaller QMA 
population could create significant upward pressure on the cost of recruiting quality 
enlistees. Coincidentally, there is continued pressure from the Congress to reduce 
military spending. Combined, these two situations reinforce the need for efficient 
recruiting operations. Central to high efficiency recruiting operations is the 
existence of accurate measures of regional recruiting market potential that provide 
the information necessary to allocate recruiting resources efficiently. 

Econometric techniques developed thus far for estimating QMA involve a 
series of steps in which unqualified segments of the youth population are dropped. 
The remaining youth population constitutes QMA. Those dropped generally include 
individuals who: (1) fall outside an established age range (generally 17 to 21 years 
of age); (2) have not graduated from high school or attained a GED; (3) score 
poorly (generally defined as below the 50th percentile) on the Armed Forces 
Qualification Test (AFQT) or (4) fail to meet moral or medical qualifications for 
military service. This thesis focuses on the estimation of the proportion of the 
youth population in a given area that would score above the 50th percentile on the 
AFQT. Specifically, its primary purpose is to develop regression equations that 
accurately forecast these proportions so that county-level estimates of QMA can be
developed nationwide.


1See Thomas (1990); page numbers have yet to be established for this draft document. 
The research question to be answered is "what factors and individual
characteristics, for which data are available nationwide at the county level, ’best’ 
predict AFQT score?" Subsidiary questions include: (1) what model specification 
is most appropriate; (2) what variables are candidates for use in AFQT score 
estimating equations and (3) what are the specific equations that best predict 
whether an individual will score above the 50th percentile on the AFQT.

Since the primary purpose for developing these estimating equations is to 
forecast AFQT score distributions for each county, determining the independent 
effects of individual explanatory variables is of secondary importance. Therefore, 
the often encountered problem of multicollinearity in causal modeling will be of less 
concern. 

This thesis makes no attempt to discuss or explain causative factors that 
determine mental ability. It attempts to identify those personal and socioeconomic 
variables that are statistically associated with AFQT score so that accurate forecasts 
of AFQT score distributions can be made for regional population subgroups. In this 
regard, this thesis borrows from the differential and developmental psychology 
literature inasmuch as this literature identifies variables statistically associated with 
mental ability. 

The primary limitation of this model development effort is that only those 
variables for which data are available nationwide at the county level can be used. 
Once equations are estimated based on individual level data, county averages of the 
explanatory variables will be used to compute the estimated AFQT score 
distributions for all U.S. counties.2 Therefore, the included variables must be 
supported by data collected by such agencies as the Census Bureau, Bureau of
Labor Statistics or the Department of Commerce.

2The actual nationwide computation of these county distributions will be the subject
of follow-on work and as such is not presented in this thesis.
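The step from an individual-level logit equation to a county-level estimate can be sketched as follows. This is a minimal illustration only: the variable names and every coefficient value are hypothetical and are not estimates from this thesis. The sketch simply evaluates a fitted logit at county averages of the explanatory variables, which is the procedure described above.

```python
import math

# Hypothetical coefficients for illustration only; NOT estimates from the thesis.
COEFFS = {
    "intercept": -0.40,
    "mother_hs_grad": 0.90,  # share of youth whose mother finished high school
    "poverty": -1.10,        # share of youth living in poverty
    "urban": 0.15,           # share of youth living in an urban area
}


def logistic(z: float) -> float:
    """Logistic (inverse logit) function."""
    return 1.0 / (1.0 + math.exp(-z))


def county_share_above_50th(county_means: dict) -> float:
    """Evaluate the logit equation at county averages of the explanatory variables."""
    z = COEFFS["intercept"]
    for name, mean in county_means.items():
        z += COEFFS[name] * mean
    return logistic(z)


# Hypothetical county averages.
example_county = {"mother_hs_grad": 0.70, "poverty": 0.15, "urban": 0.60}
share = county_share_above_50th(example_county)
print(f"Estimated share above the 50th percentile: {share:.3f}")
```

Because the logistic function is nonlinear, evaluating it at county means only approximates the average of individual probabilities; the approximation is closest when individuals within a county are similar to one another.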
The forecasting models are developed using individual level data from the
Youth National Longitudinal Survey (NLSY). The NLSY, which contains 12,686 
observations as of its first year in 1979, includes information on respondents’ work, 
education, economic and family background histories. As part of the NLSY data 
collection process in 1980, the Armed Services Vocational Aptitude Battery 
(ASVAB) was administered to those in the survey in order to establish new test 
norms for the ASVAB.3

The basic model used is borrowed in large part from that of Goldberg and 
Goldberg (1989). This work, however, expands the number of explanatory variables 
and explores interaction effects among the variables in an attempt to improve 
predictive ability. Also, whereas previous efforts to model AFQT score distributions 
were conducted for high school graduates or the equivalent, this effort models 
AFQT score distributions for high school graduates with diplomas only (i.e., no
GEDs or other equivalencies).


3The ASVAB test scores and other data collected in the 1980 administration of the 
NLSY are commonly referred to as "The Profile of American Youth" or simply "The 
PROFILE". AFQT percentile scores, which are referred to throughout this thesis as simply
AFQT scores, are calculated using raw scores from four ASVAB subtests. Appendix A
contains a summary of the procedures for converting raw ASVAB subtest scores into
AFQT scores.








II. LITERATURE REVIEW 


The advent of psychological testing and the debate over what factors 
determine an individual’s mental ability started in the latter part of the 19th century 
with the work of the British scholar, Sir Francis Galton. Galton believed that 
genetics determined mental ability and in his well known book entitled Hereditary
Genius (1869), he concluded that success ran in families because great intelligence
was passed from generation to generation through genetic inheritance. As discussed 
in Weiten (1989), Galton’s convictions regarding genetics were so strong that he 
advocated eugenics programs to improve the quality of the human race. 

In 1905, Alfred Binet devised a test of mental ability at the request of a 
French education commission. The purpose of the test (Binet-Simon) was to 
identify those children with special educational needs so they could be afforded 
special training. The underlying theory for developing such a testing system was in 
contrast to that of Galton, for it suggested that the mental ability of subnormal 
children could improve with environmental changes. In the words of Alfred Binet, 
"The intelligence of anyone is susceptible of development. With practice, 
enthusiasm, and especially with method one can succeed in increasing one’s 
attention, memory, judgement, and in becoming literally more intelligent than one 
was before." This set the stage for the long-standing debate on the determinants 
of mental ability.

In 1921 Lewis Terman started a study in which 1528 highly gifted students (IQ
of about 150) were tracked throughout their lifetimes. These students were reported
to be: (1) above average in height, weight, strength and physical health; (2) superior
in emotional adjustment and mental health and (3) socially adept and well liked.4
In other studies researchers have found very strong correlations between the mental
abilities of identical twins. Some argue that this correlation results from genetic
similarity while others argue that such twins generally develop under similar
environmental conditions. In studies of adopted children, the nurture argument
appears to dominate. Adopted children with no biological family relationship to the
parents show intelligence levels similar to the parents. In other studies, children in
understaffed orphanages and disadvantaged homes, once removed and placed in an
improved setting, show marked increases in mental ability.

4For a further discussion of Terman’s work, see Weiten (1989).

Some theorists advocate a reaction range model. In this model one’s genetic 
make-up determines the range of one’s mental ability while environmental factors 
determine one’s actual mental ability within that genetically established range. 
While this theory is intuitively appealing it is difficult to test since it is difficult to 
measure the limits of one’s genetically established range. Measurements of actual 
mental ability are, of course, much easier to obtain. 

As the 20th century draws to a close, the debate no longer appears to be
about which factor, nature or nurture, determines mental ability but rather about their
relative importance. According to Weiten (1989), the positions in this debate range
from those who argue that 80 percent of mental ability is determined by genetics 
to those who argue that this percentage is only about 40. For the interested reader 
there is a considerable amount of literature on this issue. Classic works include 
Differential Psychology: Individual and Group Differences in Behavior (1958) by Anne 
Anastasi, The Psychology of Human Differences (1965) by Leona Tyler and Bias in 
Mental Testing (1980) by Arthur R. Jensen. In Educability and Group Differences 
(1973), Jensen strongly challenges the nurture position and argues that heredity 
predominantly determines mental ability. For those choosing to pursue additional 
reading in this area, Eitelberg (1981) provides an excellent historical review and 
annotated bibliography on differences in population subgroup performance on tests
of mental ability.





Throughout the nature versus nurture debate some general agreement on 
sociodemographic correlates did emerge. These include gender, race, age, 
educational attainment, geographic region and socioeconomic status as measured by 
father or mother’s education level, poverty status or occupation. In the following 
studies the relationships between these correlates and performance on the ASVAB 
and the SAT are investigated. The first two studies, both of which relate to the 
ASVAB, are based on data from the PROFILE. The third study investigates SAT 
performance. 

Bock and Moore (1984) start with a detailed review of the development and
survey processes for the PROFILE and describe the psychometric properties of the
ASVAB. They analyze in detail the relationship between the socioeconomic
characteristics (including interactions among these characteristics) of the sample and 
performance on the individual subtests of the ASVAB. They also discuss theories 
offered by behavioral scientists in order to provide insight into causal relationships 
associated with their findings. Because the authors analyze these socioeconomic 
variables with respect to performance on the individual subtests of the ASVAB 
and not with respect to AFQT score, the usefulness of their findings is somewhat 
limited for the purpose of developing models to predict AFQT score. In general, 
however, the findings presented do provide interesting insights into potential 
relationships between participant characteristics and their AFQT scores. Variation 
in ASVAB performance was found with respect to gender, race/ethnicity, geographic 
region, poverty status, age, educational attainment and mother’s education. 

The effects of gender were highly dependent on the subtest in question; 
females performed better on paragraph comprehension while males performed 
better on arithmetic reasoning and math knowledge. There were no appreciable 
differences between males and females on word knowledge. The effects of
racial/ethnic background varied by subtest, but in general, whites significantly
outperformed blacks and Hispanics and Hispanics generally outperformed blacks.5
Within the Hispanic group, Cubans performed best, followed by Mexican-Americans
and Puerto-Ricans. Geographic region at age 14 was also found to be
related to general performance on the ASVAB. Consistent with other studies, 
individuals in the Northeast generally performed above the average while those 
from the Southeast performed below the average. These results did vary by 
racial/ethnic group. In particular, Hispanics in the Southeast and Midwest 
outperformed Hispanics from the West and Northeast. The authors credited part
of the geographical differences in Hispanic scores to the social origins of the
Hispanics living in those regions: Puerto-Rican Hispanics predominate in the
Northeast, Cuban Hispanics in the Southeast and Mexican-American Hispanics in
the West.

Economic status was characterized as either poor or not poor and was 
established in terms of the 1979 Office of Management and Budget (OMB) 
definition of poverty. Nonfarm families were defined as poor if the family income 
was less than or equal to $3770 plus $1230 times one less than the number of 
people in the family. For farm families these amounts were $3220 and $1040 
respectively. In each racial/ethnic group, poor individuals scored lower than 
individuals classified as not poor. 
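The poverty rule just described reduces to a short calculation, sketched here. The dollar amounts are those quoted above; the function names are introduced only for this illustration.

```python
def poverty_threshold_1979(family_size: int, farm: bool = False) -> int:
    """Income threshold implied by the 1979 OMB rule as summarized in the text."""
    if family_size < 1:
        raise ValueError("family_size must be at least 1")
    # Base amount for one person plus an increment for each additional member.
    base, per_additional = (3220, 1040) if farm else (3770, 1230)
    return base + per_additional * (family_size - 1)


def is_poor(family_income: float, family_size: int, farm: bool = False) -> bool:
    """A family is poor if income is less than or equal to its threshold."""
    return family_income <= poverty_threshold_1979(family_size, farm)


# A nonfarm family of four: 3770 + 1230 * 3 = 7460.
print(poverty_threshold_1979(4))             # 7460
# A farm family of four: 3220 + 1040 * 3 = 6340.
print(poverty_threshold_1979(4, farm=True))  # 6340
```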

In addressing the effects of age and educational attainment on ASVAB 
performance, the authors acknowledged that the independent effects of these 
variables were difficult to separate. They concluded, however, that ASVAB 
performance improves with educational attainment and that performance on some 
subtests improves with age while on others it declines. In general, performance on
"school intensive subjects" declines with age, while performance on practical subjects
improves. Generally the authors found that with respect to age and educational
attainment, "test performance tends to be typical of the highest grade completed."

5The "white" racial/ethnic group includes individuals who are classified as "other" (i.e.
not black or Hispanic).

The final socioeconomic variable investigated was mother’s education. 
Regarding this variable the authors stated, "increasing level of mother’s education
. . . is directly and strongly related to higher scores on all tests in the ASVAB
battery." This finding is consistent with findings presented throughout the
psychological testing literature and is generally attributed to the mother’s
predominant role in a child’s formative years. Also, mother’s education is strongly
correlated (positively) with measures of economic status which generally means 
greater opportunity for vocational and educational attainment for the child. 

Whereas the work of Bock and Moore (1984) focused on the effects of various 
socioeconomic characteristics on individual ASVAB subtest performance, Profile of 
American Youth: 1980 Nationwide Administration of the Armed Services Vocational 
Aptitude Battery looks specifically at the effects of socioeconomic variables on AFQT
scores.6 The results differ little from those presented by Bock and Moore (1984);
however, because the AFQT score is the dependent variable, the findings apply 
directly to AFQT score forecasting. 

Mean AFQT score increased with age: 46 for ages 18 and 19, 50 for ages 20
and 21, and 54 for ages 22 and 23. Overall, males had a slightly higher mean score
than females, 50.8 versus 49.5; however, this varied by age group. Males in the
18 and 19 year age group had a mean score of 45 versus 46 for females. In the 20
to 21 year age group males had a mean score of 50 whereas females were one
point behind at 49. The largest difference was in the 22 to 23 year age group with
males averaging four points more than females, 56 versus 52. Mean AFQT score
also increased with age in each racial/ethnic group.

6As in Bock and Moore (1984), this report uses data from the PROFILE. The study
investigated the same variables as Bock and Moore (1984) with the exception of poverty
status.

With respect to racial/ethnic group, whites had a mean score of 56 while
Hispanics and blacks were considerably lower with means of 31 and 24 respectively.7
White and Hispanic males scored slightly higher than their female counterparts,
while there was virtually no difference between the mean scores of black males and
females.

Mean AFQT score improved considerably with the level of educational 
attainment. Non-high school graduates, including dropouts and those still in school, 
had a mean score of 27 whereas individuals with GEDs averaged 46. Individuals 
with a high school diploma or above had a mean score of 57. Consistent with other 
studies, there was a strong positive correlation between mother’s education and 
mean AFQT score. Individuals whose mothers had an eighth grade education or 
less had a mean score of 29, while at the other extreme individuals whose mothers 
were college graduates or above had a mean score of 71. 

Mean AFQT scores also differed by geographic region: New England-60, West
North Central-58, Middle Atlantic-53, East North Central and Mountain-52,
Pacific-50, West South Central-48, South Atlantic-44 and East South Central-42. Again,
these findings are consistent with those of other researchers in studies of mental
ability.8

In summary, the authors found that: (1) whites score higher than blacks and
Hispanics; (2) AFQT scores improve with age and educational attainment; (3) there
is a strong positive correlation between mother’s education and AFQT score and
(4) individuals in the Northeastern regions of the United States score above the
average on the AFQT while those from the Southeastern regions score below the
average.

7The "white" racial/ethnic group includes individuals who are classified as "other" (i.e.
not black or Hispanic).

8The authors did not indicate whether the differences in AFQT scores discussed above
were statistically significant.

A study by Behrendt, Eisenach and Johnson (1986) investigates the effects of 
school and family characteristics on state-wide average SAT scores in 1981 and 
1982. The school and family characteristics studied are contained in Table 1 below. 
The dependent variable was the mean combined (math and verbal) SAT score for
each state.


TABLE 1. SCHOOL AND FAMILY CHARACTERISTICS STUDIED BY
BEHRENDT, EISENACH AND JOHNSON (1986)

SCHOOL

* average salary of teaching staff
* average non-salary school expenses per pupil
* average teachers per pupil
* average number of students per school (to capture scale effects)
* percentage of schools that were private
* whether or not there were state-wide high school graduation requirements

FAMILY

* percentage of population that was non-white or non-oriental
* percentage of population living in an urban area
* average number of siblings in families with children
* percentage of children in female-headed households
* percentage of population residing in the state for less than five years (to
  capture mobility effects)
* percentage of population with four years of college
* median family income for a family of four





Mean SAT score differed significantly by state as did the proportion of the 
students who took the SAT. The authors pointed out that since the brightest 
students were most likely to take the test, states with lower test participation rates 
would have higher mean scores, all else constant. They attempted to correct for 
this selectivity problem so that unbiased coefficient estimates could be produced.9

The authors concluded that school characteristics had little effect on SAT 
score. The average number of students per school had a statistically significant 
negative effect on SAT performance when only school characteristics were included 
in the model. The percentage of schools that were private had a statistically 
significant positive effect on SAT performance for the model that only included 
school characteristics and for the model that included both school and family 
characteristics. 

The average number of siblings per family and the percentage of female- 
headed households had statistically significant negative effects on SAT performance 
for a model that only included family characteristics and for a model that included 
both school and family characteristics. The percentage of the population that had 
four years of college had a statistically significant positive effect on SAT 
performance in both models. The authors were surprised by the lack of significance 
for the percentage of non-white and non-oriental variable as this is contrary to most 
findings regarding minority performance on tests of mental ability. The authors 
explained that while there was a strong bivariate relationship between this variable 
and SAT score, the variable is apparently only a proxy for other demographic 
conditions such as larger families, fewer college educated parents and more
female-headed families, all of which generally characterize minority families.

9A discussion of the statistical procedures used to correct for this selectivity bias is
beyond the scope of this brief summary. The interested reader can find a detailed
explanation of the correction procedure on pages 365 through 367 of the paper.
The discovery of relationships between variation in mental ability and
socioeconomic attributes such as those discussed above has provided manpower
researchers with the insights necessary to model AFQT distributions for the
purposes of regional QMA estimation. Curtis, Borack and Wax (1987), in a first
attempt to estimate regional QMA, clustered like counties based on socioeconomic
attributes that were correlated with AFQT score. An AFQT score distribution was 
then computed for each cluster and each county in the cluster was assumed to have 
that distribution of AFQT scores. While parsimonious, the aggregation of counties 
into several large clusters introduces biases and is dependent on only one or two 
explanatory variables. Subsequent methodologies such as those developed by 
Goldberg and Goldberg (1989) and Orvis and Gahart (1989) use maximum 
likelihood regression techniques to estimate the proportion of a population subgroup 
that falls into a given mental category as a function of a vector of explanatory 
variables. This approach allows each county’s AFQT score distribution to vary with 
its socioeconomic composition; however, it is dependent on the availability of county 
level data to support the explanatory variables selected. 

The purpose of Curtis, Borack and Wax’ research was to produce estimates 
of QMA for the years 1984 through 1990 for each Marine Corps recruiting district 
and station and for each U.S. county. While the authors were able to produce 
county estimates, they advised against relying on them because of insufficient sample 
sizes and unavailability of some county level data. A process of elimination was 
used in which unqualified segments of the population were dropped. The 
remainder constituted QMA. 

The authors used data from the PROFILE to calculate AFQT score 
distributions. Direct calculation of AFQT score distributions within racial/ethnic 
categories for each county was not possible as many counties are very small and not
adequately represented in the PROFILE. Therefore, the authors first identified
variables that were highly correlated with AFQT score and for which data were
available at the county level. For Hispanics and whites the level of education best
predicted AFQT score. For blacks, the level of education and father’s occupation
best predicted AFQT score.

AFQT score distributions for Hispanics were calculated using the percent of
adult Hispanics in the county with more than 12 years of education as a surrogate
for level of education. All counties, including those not represented in the
PROFILE, were grouped into four clusters: (1) less than 47 percent; (2) 47 to 56
percent; (3) 57 to 67 percent and (4) greater than 67 percent. The AFQT score
distribution for each cluster was then calculated from the PROFILE participants in 
that cluster. Each county in the cluster was then assumed to have the same 
distribution of AFQT scores as the cluster itself. 
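The clustering procedure for Hispanics can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure as implemented by Curtis, Borack and Wax: county percentages are assumed to be rounded to whole percents before applying the cutoffs, the cluster "distribution" is reduced to a single share (AFQT percentile of 50 or above), and the sample records are invented.

```python
from collections import defaultdict


def hispanic_cluster(pct_over_12_years: float) -> int:
    """Map a county's percent of adults with more than 12 years of education to a cluster."""
    pct = round(pct_over_12_years)  # assumption: cutoffs apply to whole percents
    if pct < 47:
        return 1
    if pct <= 56:
        return 2
    if pct <= 67:
        return 3
    return 4


def cluster_shares(participants):
    """participants: iterable of (county_pct, afqt_percentile) pairs.

    Returns, for each cluster, the share of participants at or above the 50th percentile.
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [number at/above 50, total]
    for county_pct, afqt in participants:
        cluster = hispanic_cluster(county_pct)
        counts[cluster][0] += int(afqt >= 50)
        counts[cluster][1] += 1
    return {c: above / total for c, (above, total) in counts.items()}


# Invented records for illustration: (county percent, AFQT percentile).
sample = [(40, 30), (45, 62), (50, 55), (60, 48), (70, 80), (72, 90)]
shares = cluster_shares(sample)

# Every county in a cluster inherits that cluster's share.
county_estimate = shares[hispanic_cluster(44)]
```

The final line shows the key simplification of the method: a county with no survey participants of its own still receives an estimate, because it borrows the figure computed for its cluster.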

AFQT score distributions for blacks were calculated using a variable known
as the Socio-Economic Status Indicator (SESI) as a surrogate for father’s
occupation.10 The counties were split into equal groups based on SESI. The lower
group had a SESI of less than 41 while the upper group had a SESI of 41 or
greater. Next, each SESI group was divided at its median on county educational
attainment. The surrogate for county educational attainment was percent of adult
blacks with at least 12 years of education. This produced four clusters of counties
in which the AFQT score distribution for each cluster was calculated from the
PROFILE participants in that cluster. Each county in the cluster was then assumed
to have the same distribution of AFQT scores as the cluster itself.
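The two-stage split used for blacks (SESI first, then the within-group median of educational attainment) can be sketched as follows. The county names and values are invented, and the treatment of counties falling exactly at the median is an assumption, since the summary above does not say how ties were handled.

```python
from statistics import median


def black_clusters(counties):
    """counties: dict mapping county name -> (sesi, pct_with_12_plus_years).

    Returns a dict mapping county name -> (sesi_group, education_group).
    """
    groups = {"low": {}, "high": {}}
    for name, (sesi, educ) in counties.items():
        # First stage: SESI below 41 is the lower group, 41 or greater the upper group.
        groups["low" if sesi < 41 else "high"][name] = educ

    labels = {}
    for sesi_group, members in groups.items():
        if not members:
            continue
        cut = median(members.values())
        for name, educ in members.items():
            # Second stage. Assumption: a county exactly at the median goes to the upper half.
            labels[name] = (sesi_group, "low" if educ < cut else "high")
    return labels


# Invented counties for illustration: name -> (SESI, percent with 12+ years of education).
example = {"A": (30, 40), "B": (35, 60), "C": (41, 55), "D": (50, 75)}
labels = black_clusters(example)
```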


10SESI is based on income levels, home ownership statistics and educational,
occupational and environmental characteristics that prevail in a county. It was constructed
by Donnelly Marketing Information Services with Simmons Market Research prior to this
research being conducted.
AFQT score distributions for whites were calculated using the county college
completion percentage for adults as a first surrogate for level of education. 
Counties were initially subdivided into three groups: (1) less than or equal to 21.4 
percent; (2) greater than 21.4 percent but less than or equal to 29.4 percent and (3) 
greater than 29.4 percent. Each of these three groups was then subdivided into two 
groups based on the high school completion rate for adults; this formed a total of 
six groups. The authors then combined two of these six groups because of their 
similarity in AFQT score distributions. This resulted in five clusters. The AFQT
score distribution for each cluster was then calculated from the PROFILE 
participants in that cluster. Each county in the cluster was then assumed to have 
the same distribution of AFQT scores as the cluster itself. 

Goldberg and Goldberg (1989) go a step beyond QMA estimation by using 
enlistment propensity data to estimate qualified military available and interested 
(QMA&I) population at the county and census region level. Specifically, their 
study is focused on forecasting QMA&I in the reserve recruiting markets. These
forecasts are broken down by age (17-21 and 22-29), gender and racial/ethnic 
category (white, black and Hispanic) and are provided for 1988, 1990, 1995 and 
2000.11 The authors assume the following identity for QMA&I:

QMA&I = MA × Q1 × Q2 × Q3 × RPI

where,

MA, or military available, is the number of civilian high school graduates
or equivalents,

Q1 is the proportion of MA that is medically qualified,

Q2 is the proportion of MA that is morally qualified,

Q3 is the proportion of MA that is mentally qualified, and

RPI is a reserve enlistment propensity index.12

11The "white" racial/ethnic group includes individuals classified as "other" (i.e. not black
or Hispanic).
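As a worked illustration, the identity can be read as a head count scaled by successive qualifying proportions. That multiplicative reading is an assumption made here on the basis of the definitions of Q1 through Q3 as proportions; treating RPI as a share between zero and one is likewise an assumption, and all numbers below are invented.

```python
def qma_and_interested(ma: float, q1: float, q2: float, q3: float, rpi: float) -> float:
    """Scale the military available count by each qualifying proportion and by propensity.

    Assumes a multiplicative identity; this is an interpretation, not a quoted formula.
    """
    for name, share in (("q1", q1), ("q2", q2), ("q3", q3)):
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"{name} must be a proportion between 0 and 1")
    return ma * q1 * q2 * q3 * rpi


# Invented values: 10,000 military available, 80% medically qualified,
# 90% morally qualified, 50% mentally qualified, propensity index of 0.10.
estimate = qma_and_interested(10_000, 0.80, 0.90, 0.50, 0.10)
print(round(estimate))  # 360
```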


Data from the PROFILE were used to develop AFQT score forecasting 
models for use at the county level. This meant that the explanatory variables were
limited to those for which data were available at the county level.
The authors assumed a multinomial logit functional form with four possible
AFQT category outcomes: (1) 1-3A; (2) 3B; (3) 4A and (4) 4B-5. As an alternative
for comparison purposes, a linear probability model was used in which each AFQT 
category outcome was independently regressed against the explanatory variables. Six 
separate forecasting models were developed based on gender and racial/ethnic 
category. 
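In a multinomial logit, the probability of each outcome is the exponential of that outcome's linear index divided by the sum of the exponentials over all outcomes, with one outcome's coefficients normalized to zero. The sketch below applies that form to the four AFQT categories. The coefficients and the choice of explanatory variables are hypothetical and are not those estimated by Goldberg and Goldberg (1989).

```python
import math

CATEGORIES = ["1-3A", "3B", "4A", "4B-5"]

# Hypothetical coefficients (intercept, mother_hs_grad, poverty); illustration only.
# The last category is the base outcome, so its coefficients are fixed at zero.
BETAS = {
    "1-3A": [0.2, 1.0, -1.2],
    "3B": [0.1, 0.5, -0.6],
    "4A": [0.0, 0.2, -0.2],
    "4B-5": [0.0, 0.0, 0.0],
}


def category_probabilities(x, betas=BETAS):
    """Return the multinomial logit probability of each AFQT category for values x."""
    scores = {c: sum(b * v for b, v in zip(betas[c], x)) for c in CATEGORIES}
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Intercept term, mother is a high school graduate, family is not poor.
probs = category_probabilities([1.0, 1.0, 0.0])
```

By construction the four probabilities sum to one, which is not guaranteed when each category is regressed separately as in the linear probability alternative.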

Consistent with other studies, differences in AFQT score distributions were
found among the gender and racial/ethnic groups. Other significant explanatory 
variables included mother’s education, poverty status, age, and East South Central 
census region. Puerto-Rican ethnicity was an important variable in the explanation 
of AFQT score for female Hispanics. Net family income was not significant in
explaining AFQT score for any subgroup.


12For MA the authors used Woods and Poole forecasts since they include
noninstitutionalized, civilian high school graduates (or the equivalent) of relevant age
segments (17-21 and 22-29 years old) for the years of interest. Medical qualification rates 
were those derived from the National Health and Nutrition Examination Survey, 1976- 
1980 (NHANES II) for individuals 16 to 24 years of age. Moral qualification rates were 
obtained from an Air Force study based on the juvenile delinquency rates in two U.S. 
cities. Reserve enlistment propensity indices were estimated using data from the Youth 
Attitude Tracking Survey (YATS) and were based in part on the work of Orvis (1986). 
For a comprehensive discussion of the procedures used to compute these propensity indices 
see pages 26-40 of Goldberg and Goldberg (1989). 
In analyzing the within sample error the authors estimated the AFQT score
distribution of high school graduates at the census region level using the forecasting
models and regional level means of the explanatory variables.13 Forecasted AFQT
score distributions were then compared with the actual AFQT score distributions. 
The absolute percentage errors were lower for whites and declined as the AFQT 
categories were expanded such as from 1-3A to 1-4A. The authors considered the 
absolute percentage errors of between one and ten percent "relatively low." The 
linear probability model used for comparison gave smaller errors than the 
multinomial logit model; however, the authors concluded this was because the 
values of the explanatory variables were close to their means. The multinomial logit 
functional form was considered the preferred model for county level forecasting 
since county-level observations for explanatory variables may be extreme because 
of small samples. 
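One commonly cited reason for preferring a logit form when explanatory values may be extreme is that a linear probability model can produce predictions outside the zero-to-one range. The sketch below illustrates this with a single standardized explanatory variable. The coefficients are hypothetical and were chosen so that the two models agree in level and slope at the mean; they do not come from any of the studies reviewed.

```python
import math


def linear_probability(x: float, a: float = 0.5, b: float = 0.2) -> float:
    """Linear probability model: the prediction is not bounded."""
    return a + b * x


def logit_probability(x: float, a: float = 0.0, b: float = 0.8) -> float:
    """Logit model: the prediction always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


# x is measured in standard deviations from its mean (hypothetical scale).
for x in (0.0, 1.0, 3.0):
    print(x, round(linear_probability(x), 3), round(logit_probability(x), 3))
```

At the mean the two forms give the same prediction, which is in line with the authors' explanation of why the linear model performed well on regional means; three standard deviations out, the linear prediction exceeds one while the logit prediction remains a valid probability.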

Orvis and Gahart (1989) developed an AFQT score forecasting method that 
attempts to reduce a specific source of bias in estimating AFQT scores when the 
sample of ASVAB test takers is not representative of the target population. While 
this method can be used with a variety of national youth surveys, it was developed 
primarily with the characteristics of the YATS in mind. The authors merged the 
YATS data (1976-1980) with files from the Military Entrance Processing Station 
Reporting System through March 1985. The YATS respondents were from a
stratified random sample of American youth; however, the characteristics of YATS 
respondents who take the ASVAB differ from those of YATS respondents who do 
not. Therefore, if a regression model were developed using only that portion of the
sample which took the ASVAB, parameter estimates would very likely be biased
and not representative of the target population.

13Goldberg and Goldberg (1989) defined a high school graduate as an individual who
had completed 12 or more years of education.

To correct for this selectivity bias, the authors used the Heckman procedure
which is a simultaneous two equation maximum likelihood (probit) technique 
designed to take into account: (1) the probability that an individual takes the 
ASVAB and (2) the correlation between the error terms of the equation for 
estimating the probability of taking the ASVAB and those of the equation which 
estimates AFQT score. The first equation, which estimates the probability of taking 
the ASVAB, was estimated using the entire YATS sample. The second equation, 
which estimates the probability that an individual will score in the upper 50th 
percentile on the AFQT, was estimated using those individuals who took the 
ASVAB. 
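
The selection problem that the two-equation procedure addresses can be illustrated with a small simulation. The sketch below is not the Heckman estimator and is not the authors' code. It only shows, under assumed values (the correlation `rho`, the sample size and the seed are all hypothetical), that when the error in the test-taking equation is correlated with the error in the score equation, the share scoring above the population median among test takers differs from the share in the full population.

```python
import math
import random


def simulate_selection(n=50_000, rho=0.6, seed=1):
    """Compare the above-median share in the full sample and among test takers.

    Assumptions (illustrative only): a standard normal latent score, a standard
    normal latent propensity to take the test, and correlation rho between them.
    """
    rng = random.Random(seed)
    above_all = 0      # individuals above the population median
    takers = 0         # individuals who take the test
    above_takers = 0   # test takers above the population median
    for _ in range(n):
        e_select = rng.gauss(0.0, 1.0)
        e_other = rng.gauss(0.0, 1.0)
        # The score shares the component e_select with the selection equation,
        # so the two latent variables have correlation rho.
        score = rho * e_select + math.sqrt(1.0 - rho ** 2) * e_other
        is_above = score > 0.0        # the population median of the score is 0
        takes_test = e_select > 0.0   # about half of the population self-selects
        above_all += is_above
        if takes_test:
            takers += 1
            above_takers += is_above
    return above_all / n, above_takers / takers


share_all, share_takers = simulate_selection()
print(f"share above median, full population:  {share_all:.3f}")
print(f"share above median, test takers only: {share_takers:.3f}")
```

With a positive correlation the test-taker share is inflated, which is the kind of bias a single equation fitted only to test takers would carry into its estimates.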

The authors selected the upper 50th percentile since this is the population 
from which recruiting officials attempt to draw enlistees. Other percentile scores 
could be established as the cutoff point. The statistical model adjusts the 
parameters of the second equation to minimize the selectivity bias modeled by the 
first equation. The second equation can then be used with representative samples 
to estimate the probability that an individual will fall above a given cutoff percentile 
on the AFQT. Orvis and Gahart felt that the two equation method produces more 
accurate estimates of AFQT category than a single equation method. 

This methodology for estimating AFQT category is particularly useful in that 
representative samples of individuals taking the ASVAB are not always readily 
available and therefore selectivity bias is a concern. However, as pointed out by 
Goldberg and Goldberg (1989), the models developed by Orvis and Gahart rely 
heavily on data collected through the YATS which are not available at the county 
or census region level. This severely limits the usefulness of this model for
regional/county forecasting of AFQT score distributions.










III. THEORETICAL MODEL, METHODOLOGY AND DATA


Developmental and differential psychology research has demonstrated that 
performance on tests of mental ability is associated with gender, race, educational 
opportunities and attainment, family structure, parents’ achievements as well as
other related factors. Human capital theory also suggests that there is a 
relationship between an individual’s mental ability and his income. In part, human 
capital theory assumes that an individual brings a "portfolio" of personal qualities 
and characteristics to the labor market for which employers are willing to pay. Such 
characteristics may include previous employment experience, special training, 
education, or mental ability as demonstrated on tests at the time of hiring or in 
on-the-job performance. The more these characteristics increase the individual’s 
contribution to the firm’s marginal product, the greater the amount of pay the 
employer is willing to provide the individual. From this theory and the findings of 
the psychological research discussed above, performance on the AFQT is assumed 
to be a function of an individual’s sociodemographic characteristics. 

In their recruiting efforts, the U.S. Armed Forces attempt to draw enlistees
from the youth population that scores above the 50th percentile on the AFQT (i.e.,
mental categories I, II and IIIA). Recruiting officials generally consider high school 
graduates, ages 17 to 21, who score in the upper 50th percentile to be the "prime" 
or "quality" recruiting market. Therefore, in order to estimate the number of QMA 
youth in a given geographic region, the proportion of 17 to 21 year old high school 
graduates that can score above the 50th percentile on the AFQT must be estimated. 

The data source selected for modeling this proportion is the NLSY. The NLSY was
initiated in 1979 in an attempt to study and better understand the labor force
behavior of American youth. As discussed in Bock and Moore (1984), the NLSY
sample consists of three independent probability samples: (1) a cross-section sample
designed to represent the noninstitutionalized civilian segment of American young
people 14 to 21 years old as of January 1, 1979, in their proper population
proportions; (2) a supplemental oversample of civilian Hispanic, Black, and 
economically disadvantaged non-Hispanic, non-Black (poor white) youth in the same 
age range; and (3) a military sample designed to represent youth aged 17 to 21 as 
of January 1, 1979 who were serving in the military as of September 30, 1978.14 

The NLSY is weighted in order to compensate for the unequal probability of 
selection. New sample weights are computed each year to adjust for the 
characteristics of those respondents who drop out of the survey and the changing 
composition of the overall population the sample represents. The sample weights 
do not appear to change significantly from year to year and are set to zero for 
those years in which a respondent did not participate in the NLSY.15

During 1980 the ASVAB was administered to NLSY participants in order to
establish new test norms.16 In order to encourage maximum participation in the
testing effort, an honorarium of $50 was paid to each NLSY respondent who took
the ASVAB.17 Of the initial 12,686 NLSY respondents, 11,914 actually took the
test. This equates to a nearly 94 percent participation rate.


14Sample sizes for these independent probability samples are: (1) cross-section, 5766;
(2) supplemental oversample, 4990; and (3) military sample, 1158. These sample sizes
represent only those survey respondents who took the ASVAB.


15For a more comprehensive discussion of the sample construction and weighting
scheme see Bock and Moore (1984) or chapter 4 of the NLSY Documentation 1979-1987
Background Materials, Attachments, Appendices, & Special Survey Documentation
(reference 10).


16Until this point, test norms were still based on data from World War II, the last era 
in which extensive psychological testing was conducted. 


17The military portion of the sample also took the ASVAB; their original ASVAB test
scores upon enlistment were not used as their scores for this testing effort.

The partitioning of the NLSY data follows directly from the way recruiting
officials define the "quality" recruiting market. Only high school graduates who were
17 to 21 years of age in 1980 are retained in the sample. To qualify as a high
school graduate, a respondent had to report receiving a high school diploma during
or before the 1981 administration of the NLSY.18 Seventeen to 21 year old survey
participants who received a high school diploma after 1981 were generally off a
normal education track. This observation is supported empirically in the NLSY data
by their poorer performance on the AFQT as well as by the fact that the ratio of
GEDs to high school diplomas for such individuals is significantly higher than that
for respondents who graduated in 1981 or before.

The sample of 17 to 21 year old high school graduates is then divided into six
subgroups which include: (1) white males (WM); (2) white females (WF); (3) black
males (BM); (4) black females (BF); (5) Hispanic males (HM); and (6) Hispanic
females (HF).19 Several factors support partitioning the data in this manner. First,
as evidenced in Appendix B, the mean AFQT scores differ significantly among these
six subgroups as does the percentage of individuals who scored above the 50th
percentile. Secondly, this partitioning is consistent with the distinction recruiting
officials often make in setting recruiting goals for various population subgroups.
And finally, if dummy variables are used to capture the effects of gender and race,





18For the purpose of this model, GED recipients are not considered high school
graduates. Mean AFQT score for GED recipients is significantly lower than that of high
school diploma recipients.

19"White" includes individuals who are classified as "other" (i.e., not black or Hispanic).

20As an additional indication of significant differences between mean AFQT scores for
these six groups, Appendix C provides the results of an ordinary least squares model in
which AFQT score is the dependent variable and dummy variables are included for the
subgroups (white male is the base case). In addition to all of the coefficients for these
subgroups being negative and, with the exception of Hispanic males, statistically significant,
there are relatively large absolute differences in the size of the coefficients.

the coefficients and thus the effects of the remaining explanatory variables are
forced to be the same for each subgroup. As the final model specifications show, 
this is very likely not the case. 

Individuals in the military sample of the NLSY are retained in the samples for 
these six subgroups provided they meet the criteria for age and high school 
graduation status. A priori it was expected that retaining those in the military 
would bias the probability of being above the 50th percentile on the AFQT upwards
since presumably these individuals were selected for military service in part because 
they scored above the 50th percentile. In fact, prior to partitioning the NLSY 
sample based on age and high school graduation status, a significant portion of the 
military respondents scored below the 50th percentile on the AFQT. Once age and 
high school graduation status are controlled for through data partitioning, the 
numbers of military individuals remaining in the subgroup samples are extremely 
small.21

As discussed earlier, previous research has clearly demonstrated a strong 
positive correlation between mother’s education and performance on the AFQT. 
The models developed in this thesis, however, use average parents’ education 
instead of mother’s education.22 Several factors support the use of this variable
rather than mother’s education. First, it performs as well as mother’s education at
explaining performance on the AFQT. Secondly, it helps reduce the relatively large
number of missing values for mother’s education. And finally, county-level data to
support this variable are more readily available than they are for mother’s education.


21After dropping observations that fail to meet age and high school graduation status
criteria and observations with missing values, the following numbers of military observations
remain by subgroup in the final model specifications presented in Chapter 4: WM-1; WF-1;
BM-2; BF-0; HM-0; and HF-0.





22For those observations in which one of the parent’s education is missing, the other 
parent’s education is used as the average parents’ education. 

Poverty and welfare variables are included as proxies for general
socioeconomic status and are expected to have a negative effect on AFQT
performance. Similarly, net family income is also included as an indicator of
socioeconomic status and is expected to have a positive effect on AFQT
performance.23

A dummy variable for living in an urban area is included to capture such 
effects as the quality of schools and educational opportunities the individual may 
have received. In general, a priori expectations are that living in an urban area will 
have a negative effect on AFQT performance for minorities since such individuals 
often attend inner-city schools with marginal education opportunities and standards. 
A better explanatory variable with which to capture such effects and perhaps 
interact with the urban variable is the type of school, public or private, that the 
individual attended. Surprisingly, the NLSY does not contain such a variable. 

To capture the effects of a stable family life on AFQT performance, a dummy 
variable for whether or not an individual came from a dual-parent (or stepparent)
family is investigated. A priori expectations are that individuals who come from a 
dual-parent home perform better on the AFQT than those who do not. 

Interaction effects among these explanatory variables are also explored.
Additionally, various census regions were considered for inclusion in the
models. While significant relationships exist between various census regions and
performance on the AFQT, one must make the assumption that AFQT
performance in each county is reflective of that for the entire census region in order
to include this variable in the model. Such an assumption seems very unlikely and
therefore census regions are not included as explanatory variables.

23The variable for net family income in 1980 has a relatively large number of missing
values (approximately 23 percent). To preserve the sample sizes, net family income in
1979 or 1981 was substituted when the 1980 net family income was missing. These income
values were then deflated to 1967 dollars using the Consumer Price Index. T-tests indicate
that the difference in real income between 1979 and 1980 was statistically significant in the
white male, black male and Hispanic male samples. The difference in real income
between 1980 and 1981 was statistically significant for black males only.
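
The income handling described in the note on net family income (substituting an adjacent year when 1980 is missing, then deflating to 1967 dollars) can be sketched as follows. This is a minimal illustration and not the thesis's procedure: the function names are hypothetical, the index values are assumed for illustration and should be checked against the published Consumer Price Index series, and the source does not say whether 1979 or 1981 was preferred when both were available, so the order used here is an assumption.

```python
def deflate(nominal, cpi_year, cpi_base=100.0):
    """Convert a nominal dollar amount to base-period dollars.

    cpi_year is the price index for the year the income was reported;
    cpi_base is the index in the base period (100 when the series is
    expressed as 1967 = 100).
    """
    return nominal * cpi_base / cpi_year


# Assumed index values (1967 = 100), for illustration only.
ASSUMED_CPI = {1979: 217.4, 1980: 246.8, 1981: 272.4}


def real_family_income(income_by_year):
    """Use 1980 income when present, else 1979, else 1981 (assumed order)."""
    for year in (1980, 1979, 1981):
        nominal = income_by_year.get(year)
        if nominal is not None:
            return deflate(nominal, ASSUMED_CPI[year])
    return None  # income missing in all three years


print(round(real_family_income({1980: 24680.0}), 2))
print(round(real_family_income({1980: None, 1979: 21740.0}), 2))
```

Deflating before pooling matters here because a nominal 1979 or 1981 figure is not comparable with a 1980 figure until all are expressed in the same dollars.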

In summary, the explanatory variables chosen for investigation include: (1) 
parents’ education; (2) poverty status; (3) whether or not the individual’s family 
received government subsistence payments (welfare); (4) net family income; (5) 
whether or not the individual lived in an urban area; and (6) whether or not the
individual came from a home with two parents and/or stepparents.24

In estimating the probability that an individual will score above the 50th 
percentile on the AFQT, it is assumed that the random errors affecting AFQT 
performance are logistically distributed. Therefore, the natural log of the odds ratio
of scoring above the 50th percentile can be specified as:

ln [P/(1-P)] = a + bX + u,

where
P = the probability that an individual will score above the 50th
    percentile on the AFQT,
X = the vector of sociodemographic explanatory variables discussed
    above, and
u = a random error term that is logistically distributed.


24Appendix D contains a table of the NLSY variable numbers and names that were
used to construct the variables for the models developed.

The logit functional form has several attractive properties. Unlike the linear
probability model, the logit functional form constrains the probability to the range
zero to one and does not assume that the effects of the explanatory variables on
the probability of being in the prime market are constant. In those instances where
a county-level average for a particular explanatory variable varies significantly from
the mean of the explanatory variable used to estimate the model, the constant
effects assumption would give less accurate estimates. The logit functional form
allows the model to capture nonlinearities in the probability relationships.
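
The two properties named above can be seen in a short numerical sketch. Solving the log-odds equation for P gives P = 1/(1 + exp(-(a + bX))), and differentiating gives a marginal effect of b·P·(1 - P), which changes with X. The one-variable model and its values of `a` and `b` below are hypothetical and are not estimates from this thesis.

```python
import math


def logistic(z):
    """Inverse of the log-odds transform: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(a, b, x):
    """Change in P per unit change in x, evaluated at x: b * P * (1 - P)."""
    p = logistic(a + b * x)
    return b * p * (1.0 - p)


# Hypothetical one-variable model: a = -3.0, b = 0.25.
a, b = -3.0, 0.25
for x in (4.0, 12.0, 20.0):
    p = logistic(a + b * x)
    print(f"x={x:4.1f}  P={p:.3f}  dP/dx={marginal_effect(a, b, x):.4f}")
```

The effect of a one-unit change in x is largest where P is near one half and shrinks toward the tails, which is why evaluating a linear probability model far from the estimation means can mislead.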

IV. RESULTS


Comparisons of the means of the variables among the racial subgroups 
confirm a priori expectations and are consistent with previous research findings. 
The percentage of whites scoring above the 50th percentile is the highest followed 
by Hispanics and then blacks. Average parents’ education for whites exceeds the 
12 year level followed by blacks and Hispanics with approximately 11 and 9 years
respectively. Whites are less likely to be in poverty or receiving welfare than are 
blacks or Hispanics--blacks are slightly more likely to be in poverty or receiving 
welfare than are Hispanics. Mean family income levels are also consistent with the 
findings for poverty status and welfare. Whites are less likely to live in urban areas 
than are blacks and Hispanics while Hispanics are more likely than blacks to live 
in an urban area. The descriptive statistics to support these observations are 
contained in Appendix B. Descriptive statistics for whether or not an individual 
lived in a dual-parent family are not provided in Appendix B as this variable was 
not included in the final model specifications; however, as expected, whites were 
more likely than blacks and Hispanics to come from dual-parent homes. 

The logit regression equations presented in Table 2 below were calculated 
using the LOGIST procedure contained in release 5.16 of SAS at the United States 
Naval Postgraduate School. In addition to using the weighting option in the 
LOGIST procedure, the NORMWT option was used to normalize the sampling 
weights so that the sum of the weights equaled the actual sample size. Without the 
NORMWT option the standard errors for the coefficients are very small with very 
large chi-squared statistics which do not reflect the actual sample size. 
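
The weight normalization described above amounts to rescaling the sampling weights so that they sum to the number of observations. The sketch below illustrates that arithmetic only; it is not the SAS implementation, and the function name and the raw weights are hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights so that they sum to the number of observations."""
    n = len(weights)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w * n / total for w in weights]


# Hypothetical raw sampling weights (people "represented" by each respondent).
raw = [1200.0, 800.0, 2500.0, 500.0]
norm = normalize_weights(raw)
print(norm)
print(sum(norm))
```

The relative weights are unchanged, so coefficient estimates are unaffected; only the implied sample size, and with it the standard errors and chi-squared statistics, is brought back to the actual number of respondents.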

The primary criterion used for selecting the "best" model was goodness of fit
as measured by the percentage of individuals properly categorized as above or
below the 50th percentile and the model R statistic.25 Other criteria including
parsimony and theoretically consistent signs for variable coefficients were also
considered.
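
The percentage properly categorized can be computed as in the sketch below. The source does not state the classification cutoff or whether the percentage was weighted; an unweighted count with a cutoff of 0.5 is assumed here, and the probabilities and outcomes are hypothetical.

```python
def percent_correct(probs, outcomes, cutoff=0.5):
    """Percent of cases whose predicted class matches the observed class."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    hits = sum((p >= cutoff) == bool(y) for p, y in zip(probs, outcomes))
    return 100.0 * hits / len(probs)


# Hypothetical predicted probabilities and observed outcomes
# (1 = scored above the 50th percentile, 0 = did not).
probs = [0.81, 0.35, 0.62, 0.48, 0.20, 0.73]
outcomes = [1, 0, 0, 1, 0, 1]
print(round(percent_correct(probs, outcomes), 1))
```

Because this measure depends on the cutoff and on how common the outcome is, a subgroup in which few individuals score above the 50th percentile can show a high percentage even when the model adds little beyond predicting "below" for everyone.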


25The model R statistic measures the predictive ability of the model. For a complete
discussion of this statistic see page 271 of the SUGI Supplemental Library User’s Guide
(1986).

TABLE 2. AFQT MODEL RESULTS

WHITE MALES
Variable            Coefficient   Std Error   P-value
Intercept           -2.322        .397        <.0001
Avg Parents’ Ed.    .258          .032        <.0001
Welfare             -.133         .434        <.7601
Poverty             -.111         .242        <.6455
Income              -.0000039     .000011     <.7171
Urban               .186          .162        <.2527
Model Chi-Square=87.31
R=.238
Percent Correctly Predicted=71.7

WHITE FEMALES
Variable            Coefficient   Std Error   Chi-Square   P-value
Intercept           -3.976        .389        104.49       <.0001
Avg Parents’ Ed.    .404          .034        144.00       <.0001
Welfare             -.699         .370        3.56         <.0592
Poverty             -.435         .205        4.51         <.0338
Income              .0000045      .00001      .19          <.6602
Urban               -.391         .146        7.13         <.0076
Model Chi-Square=228.25
R=.348
Percent Correctly Predicted=69.1

BLACK MALES
Variable            Coefficient   Std Error   Chi-Square   P-value
Intercept           -5.071        .826        37.68        <.0001
Avg Parents’ Ed.    .291          .061        22.56        <.0001
Welfare             -.039         .414        .01          <.9252
Poverty             -.835         .452        3.40         <.0650
Income              .000051       .000024     4.45         <.0350
Urban               .180          .437        .17          <.6813
Model Chi-Square=50.44
R=.316
Percent Correctly Predicted=79.2

BLACK FEMALES
Variable            Coefficient   Std Error   P-value
Intercept           -4.056        .631        <.0001
Avg Parents’ Ed.    .287          .054        <.0001
Welfare             -.802         .421        <.0571
Poverty             -.273         .289        <.3451
Income              .0000064      .000025     <.7977
Urban               -.750         .289        <.0095
Model Chi-Square=58.38
R=.308
Percent Correctly Predicted=82.8

HISPANIC MALES
Variable            Coefficient   Std Error   P-value
Intercept           -2.706        .614        <.0001
Avg Parents’ Ed.    .195          .043        <.0001
Welfare             -.524         .596        <.3797
Poverty             -.088         .492        <.8580
Income              .000048       .000032     <.1377
Urban               .511          .538        <.3421
Model Chi-Square=48.61
R=.370
Percent Correctly Predicted=66.2

HISPANIC FEMALES
Variable            Coefficient   Std Error   Chi-Square   P-value
Intercept           -3.128        .984        10.10        <.0015
Avg Parents’ Ed.    .198          .050        15.34        <.0001
Welfare             -2.162        1.212       3.18         <.0745
Poverty             -.801         .467        2.94         <.0866
Income              -.000014      .000032     .19          <.6664
Urban               .552          .894        .38          <.5374
Model Chi-Square=34.87
R=.315
Percent Correctly Predicted=78.9
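
The source does not say how the Chi-Square and P-value columns of Table 2 were computed. The reported figures appear consistent with a squared ratio of coefficient to standard error, referred to a chi-square distribution with one degree of freedom; the sketch below checks that reading against the white female poverty row and is an interpretation, not a statement of what the LOGIST procedure does internally.

```python
import math


def wald_chi_square(coef, std_err):
    """Squared ratio of a coefficient to its standard error."""
    return (coef / std_err) ** 2


def p_value_1df(chi_sq):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(chi_sq / 2.0))


# White female poverty coefficient and standard error as reported in Table 2.
chi_sq = wald_chi_square(-0.435, 0.205)
print(f"chi-square = {chi_sq:.2f}, p = {p_value_1df(chi_sq):.4f}")

# A chi-square value taken directly from Table 2 (white female welfare row).
print(f"p for chi-square 3.56 = {p_value_1df(3.56):.4f}")
```

Small differences from the tabled values are expected because the published coefficients and standard errors are rounded.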


An implicit assumption in the specification of each of these models which is 
neither theoretically consistent nor supported by the findings of previous research 
is that AFQT performance does not vary with age within the 17 to 21 year old 
range. Controlling for age would require county-level data on the age distribution 
within the 17 to 21 year old population. Since such data, not unlike that for other 
potentially important variables, generally are not available at the county level, this 
assumption must be made for the purposes of this model development effort. The 
omission of the age variable as well as other potentially important explanatory 
variables will undoubtedly introduce some degree of specification bias. 

The bivariate relationships presented in Table 3 show no clear upward trends 
in the percentages of individuals scoring above the 50th percentile with increases 
in age; however, when a continuous variable for age is introduced into the final 
model specifications, the multivariate relationships appear more telling. For each 
model the coefficient for age is positive as expected and in the models for white
males, white females and black females the coefficients are significant at the .02, .01
and .06 levels respectively. Additionally, the inclusion of the continuous age variable
improves the percentage correctly predicted in the models by as much as two
percent.


TABLE 3. PERCENTAGE OF RESPONDENTS SCORING ABOVE THE
50TH PERCENTILE ON THE AFQT BY AGE

Age    WM      WF      BM      BF      HM      HF
17     .714    .659    .244    .149    .572    .291
18     .699    .679    .183    .175    .323    .179
19     .690    .683    .185    .190    .401    .218
20     .720    .679    .236    .138    .449    .221
21     .781    .659    .250    .263    .605    .301


Consistent with findings throughout the psychological testing literature and 
previous efforts to model AFQT distributions, parents’ education has a strong 
positive effect on the probability of an individual scoring above the 50th percentile 
on the AFQT. 

The coefficient signs for welfare and poverty are negative in each subgroup
model confirming a priori expectations; however, welfare is significant (at the .10
level) only in the three female models while poverty is significant (at the .10 level
or better) only in the white female, black male and Hispanic female models. With one
exception, the effects of income were generally consistent with the finding of Goldberg
and Goldberg (1989) that income has no significant effect on AFQT performance. In
the black male model, income was significant at the .05 level.

A priori expectations for the urban variable were confirmed only in part. As
a proxy for school quality and educational opportunity, as well as for other hard to
measure quality of life attributes, urban was expected to have a negative effect on
the probability of a minority individual scoring in the upper 50th percentile on the 
AFQT. The results for this variable are interesting but not particularly easy to 
explain. For white and black males, urban has a positive but not significant effect 
at conventional levels on the probability of scoring in the upper 50th percentile. 
For white and black females on the other hand, urban has a significant (.01 level) 
negative effect on the probability of scoring in the upper 50th percentile. In both 
Hispanic models, urban has a positive but not significant effect. 

The dummy variable for coming from a dual-parent family was not included 
in the final model specifications. While the variable has a positive and significant 
bivariate relationship with AFQT performance, no significant multivariate 
relationships could be obtained in alternative model specifications. Additionally, the 
variable did not increase the goodness of fit for any of the final model 
specifications. 

Interaction effects among the explanatory variables were also investigated but 
none are included in the final model specifications. Again, significant bivariate 
relationships can be obtained between many of the interaction variables and AFQT
score; however, significant multivariate relationships cannot be established nor do 
any of these variables improve the goodness of fit of the models. 

The question of independent effects of the explanatory variables and the 
problem of multicollinearity were less of a concern since the primary criterion for 
model selection is predictive quality. A priori expectations were that the degree of 
multicollinearity among the variables would be "high" to "severe." Human capital 
theory suggests a strong relationship between parents’ education and net family 
income. Since welfare is a function of poverty which is in turn a function of 
income, the collinearity among these variables was considered strong a priori.
Surprisingly, these expectations were not confirmed. Using a linear probability
model with the same explanatory variables that are in the final logit models,
condition indices and variance inflation factors of approximately 12 and 4
respectively were obtained from the SAS collinearity diagnostics.26 Since
multicollinearity is considered to be a sample phenomenon, the lack of strong
collinearity among these variables is most likely attributable to the peculiar
characteristics of the samples used in this study.
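
To put the reported variance inflation factor in perspective, a VIF is defined as 1/(1 - R²), where R² comes from regressing one explanatory variable on all the others. The helper names below are hypothetical, and the sketch only converts between the two quantities; it does not reproduce the SAS diagnostics.

```python
def vif_from_r_squared(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the rest."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Invert the relationship: R^2 = 1 - 1 / VIF."""
    if vif < 1.0:
        raise ValueError("a variance inflation factor is never below 1")
    return 1.0 - 1.0 / vif


print(r_squared_from_vif(4.0))
print(vif_from_r_squared(0.9))
```

A VIF of about 4 therefore corresponds to roughly three quarters of a predictor's variance being accounted for by the other predictors, which is noticeable but below the value of 10 that is often used as a rule of thumb for serious collinearity.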

As a comparison of the models presented in Table 2 with those of Goldberg 
and Goldberg (1989), the estimated proportions of individuals who would score 
above the 50th percentile on the AFQT were computed at the means of the 
explanatory variables as contained in Appendix B.27 A priori it was expected that
the models developed in this thesis would estimate a higher proportion of 
individuals scoring above the 50th percentile since only high school graduates with 
diplomas were included in the subgroup samples. Confirmation of this expectation 
varied by subgroup. 
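
Evaluating a fitted equation at a set of means works as sketched below. The coefficients are the white female estimates reported in Table 2; the means are hypothetical placeholders chosen for illustration, not the Appendix B values, so the printed proportion is not a result of this thesis.

```python
import math

# White female coefficients as reported in Table 2.
WF_COEFS = {
    "intercept": -3.976,
    "avg_parents_ed": 0.404,
    "welfare": -0.699,
    "poverty": -0.435,
    "income": 0.0000045,
    "urban": -0.391,
}

# Hypothetical regional means; NOT the Appendix B values.
example_means = {
    "avg_parents_ed": 12.5,   # years of schooling
    "welfare": 0.03,          # share of families receiving welfare
    "poverty": 0.08,          # share of families in poverty
    "income": 9000.0,         # net family income, 1967 dollars
    "urban": 0.75,            # share living in urban areas
}


def predicted_proportion(coefs, means):
    """Evaluate the fitted logit at a vector of means and return P."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in means.items())
    return 1.0 / (1.0 + math.exp(-z))


print(f"{predicted_proportion(WF_COEFS, example_means):.3f}")
```

The same function applies to county-level use: replacing `example_means` with a county's averages gives that county's estimated proportion for the subgroup.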

The models developed in this thesis estimated higher proportions for white
males, black males and black females while the Goldberg models estimated higher
proportions for the remaining three subgroups. While the model specifications do
differ, the results in Table 4 indicate in part that the equations required to model
the AFQT performance of high school graduates with diplomas differ from those
required to model that of high school graduates including individuals with
equivalencies.

26See page 409 of Neter, Wasserman and Kutner (1989) for a discussion of variance
inflation factors and page 301 of Gujarati (1988) for a discussion of condition indices.

27Variable means for East South Central census region and Puerto-Rican which are
included in the Goldberg models were calculated for the appropriate subgroup samples in
Appendix B.

TABLE 4. COMPARISON OF THE ESTIMATED PROPORTIONS OF
INDIVIDUALS SCORING ABOVE THE 50TH PERCENTILE ON
THE AFQT

Population    Goldberg       Model in
Subgroup      Model          Table 2
WM            [illegible]    .738
WF            .706           .672
BM            .141           .175
BF            .118           .141
HM            .472           .460
HF            .232           .177

V. CONCLUSIONS


The models in Table 2 support the suggestions of previous research that 
parents’ education, poverty status and urban residence are good predictors of 
mental ability. Several additional conclusions arising from this work are also worthy 
of mention. First are the relative effects of the individual explanatory variables. 
The estimation and analysis of numerous alternative model specifications confirms 
that average parents’ education is the best sociodemographic predictor of AFQT 
performance of the variables studied. While the other included variables have 
significant bivariate relationships with AFQT performance, their multivariate 
relationships are much less pronounced and consistent across gender/racial groups. 
Additionally, their ability to improve the goodness-of-fit for the models is small. 

Secondly, the constraints on explanatory variables caused by NLSY data 
limitations and availability of county-level data provide little opportunity to expand 
the set of explanatory variables. School characteristics and investments per capita 
in education would seem to have the most promise in terms of expanding this set 
of variables; however, even if county-level data were available to support such 
variables, the NLSY surprisingly has no variable counterparts for such data. And 
of course, it is quite unlikely that the Department of Defense will sponsor another 
ASVAB testing effort such as the Profile of American Youth any time in the near 
future. While other data sources such as the YATS could be used, these sources
are not without their problems. In particular, sample sizes and selectivity bias
render a data source such as YATS undesirable for the task of estimating AFQT 
distributions. 

Perhaps the best prospects for improved estimation of regional mental
category distributions lie in the national effort to reform our ailing school systems.
If nationwide education standards and testing systems are established, the prospects
for obtaining accurate and precise estimates of mental category distributions at an
observation level as low as the individual high school are encouraging.

Until then, the model results of this thesis are likely to provide the best
estimating equations for use in regional QMA analysis. These equations represent
a substantial improvement over previous efforts that included high school graduate
equivalents with high school graduates who earned diplomas. These results
represent the current state-of-the-art model for estimating the high quality recruiting
market.





APPENDIX A 


CONVERSION OF RAW ASVAB DATA TO AFQT PERCENTILE SCORES 


AFQT percentile scores are computed using standardized scores for the
Verbal (includes Word Knowledge and Paragraph Comprehension), Arithmetic 
Reasoning and Math Knowledge ASVAB subtests. The conversion of raw subtest 
scores to standard scores is accomplished through a linear transformation using a 
mean of 50 and a standard deviation of 10. Transformations are based on the data 
collected through the Profile of American Youth using the weighted population of 
18 to 23 year old males and females. The formula to transform a raw subtest score
into a standard subtest score (SSS) is as follows:

SSS = (10/S) (NC-X) + 50,

where
SSS = the standardized subtest score (round this result to the nearest
      integer: if it is less than 20 then raise it to 20 and if it is greater
      than 80 then lower it to 80)
S   = the standard deviation of the subtest raw scores (see Table A-1 for
      this value for each subtest)
NC  = the number of questions answered correctly for the given subtest
      (for Verbal this is the sum of the number answered correctly for
      Word Knowledge and Paragraph Comprehension)
X   = the mean of the subtest raw scores (see Table A-1 for this value for
      each subtest)

Table A-1 provides the standard deviation (S) and mean raw score (X) for
each of the subtests.














TABLE A-1. ASVAB SUBTEST MEANS AND STANDARD DEVIATIONS

ASVAB Subtest           Abbreviation    Mean      Standard Deviation
Verbal                  VE              37.281    10.595
Arithmetic Reasoning    AR              18.009    7.373
Math Knowledge          MK              13.578    6.393


The total standardized sum of scores (TSSS) is then computed as follows:
TSSS = 2(SSS for VE) + (SSS for AR) + (SSS for MK). The transformation from
TSSS’s to AFQT percentile scores is nonlinear. Table A-2 contains TSSS’s, or
ranges thereof, and their associated AFQT percentile score. Table A-3 recaps the
transformation of TSSS’s to AFQT percentile scores and shows the associated
mental group categories.
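
The two formulas above can be worked through in a few lines. The means and standard deviations are those of Table A-1; the function names and the example numbers of correct answers are hypothetical, and rounding exact halves upward is an assumption, since the text says only "nearest integer".

```python
import math

# (mean, standard deviation) of raw scores for each subtest, from Table A-1.
SUBTESTS = {
    "VE": (37.281, 10.595),
    "AR": (18.009, 7.373),
    "MK": (13.578, 6.393),
}


def standard_score(subtest, num_correct):
    """SSS = (10/S)(NC - X) + 50, rounded and limited to the range 20 to 80."""
    mean, sd = SUBTESTS[subtest]
    raw = (10.0 / sd) * (num_correct - mean) + 50.0
    rounded = math.floor(raw + 0.5)   # round half up (an assumption for ties)
    return max(20, min(80, rounded))


def total_standard_score(ve, ar, mk):
    """TSSS = 2(SSS for VE) + (SSS for AR) + (SSS for MK)."""
    return (2 * standard_score("VE", ve)
            + standard_score("AR", ar)
            + standard_score("MK", mk))


# Hypothetical numbers of correct answers: VE = 40, AR = 20, MK = 15.
print(total_standard_score(40, 20, 15))
```

The resulting total is then looked up in Table A-2 to obtain the AFQT percentile score; that last step is a table lookup rather than a formula because the transformation is nonlinear.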


TABLE A-2. CONVERSION OF TOTAL STANDARDIZED SUM OF SCORES 
TO AFQT PERCENTILE SCORES 


TSSS        AFQT PCT SCORE      TSSS        AFQT PCT SCORE 

95-120            1             155-156          15 
121-124           2             157-158          16 
125-127           3             159-160          17 
128-131           4             161-162          18 
132-134           5             163-164          19 
135-137           6             165              20 
138-139           7             166-167          21 
140-142           8             168-169          22 
143-144           9             170-171          23 
145-146          10             172              24 
147-148          11             173-174          25 
149-150          12             175              26 
151-153          13             176-177          27 
154              14             178              28 

179-180          29             218              64 
181              30             219              65 
182              31             220              66 
183-184          32             221              67 
185              33             222              68 
186              34             223              69 
187-188          35             224              70 
189              36             225              71 
190              37             226              72 
191              38             227              73 
192              39             228              74 
193              40             229              75 
194              41             230              76 
195-196          42             231              77 
197              43             232              78 
198              44             233              79 
199              45             234-235          80 
200              46             236              81 
201              47             237              82 
202              48             238-239          84 
203              49             240              85 
204              50             241              86 
205              51             242              87 
206              52             243              88 
206              52             244              89 
207-208          53             245              90 
209              54             246              91 
210              55             247              92 
211              56             248              93 
212              57             249              94 
213              58             250              95 
214              59             251              96 
215              61             252              97 
216              62             253              98 
217              63             254-258          99 


Note: the 60th and 83rd percentiles are omitted 
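
The TSSS formula and the Table A-2 lookup can be combined in a short Python sketch. The function and variable names here are hypothetical. The band limits are transcribed from Table A-2 on the assumption that the percentile column increases monotonically, with the 60th and 83rd percentiles skipped as the note states.

```python
from bisect import bisect_left

# Upper TSSS limit of each percentile band, transcribed from Table A-2.
_UPPER = (
    [120, 124, 127, 131, 134, 137, 139, 142, 144, 146, 148, 150, 153, 154,
     156, 158, 160, 162, 164, 165, 167, 169, 171, 172, 174, 175, 177, 178,
     180, 181, 182, 184, 185, 186, 188, 189, 190, 191, 192, 193, 194, 196,
     197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 208, 209, 210, 211,
     212, 213, 214]                 # percentiles 1-59
    + list(range(215, 234))         # percentiles 61-79, one TSSS value each
    + [235, 236, 237, 239]          # percentiles 80, 81, 82, 84
    + list(range(240, 254))         # percentiles 85-98, one TSSS value each
    + [258]                         # percentile 99
)

# Percentile attached to each band; 60 and 83 do not occur in the table.
_PERCENTILES = [p for p in range(1, 100) if p not in (60, 83)]
assert len(_UPPER) == len(_PERCENTILES)


def total_standardized_sum(ve: int, ar: int, mk: int) -> int:
    """TSSS = 2(SSS for VE) + (SSS for AR) + (SSS for MK)."""
    return 2 * ve + ar + mk


def afqt_percentile(tsss: int) -> int:
    """Look up the AFQT percentile score for a TSSS between 95 and 258."""
    if not 95 <= tsss <= 258:
        raise ValueError("TSSS outside the range covered by Table A-2")
    # The first band whose upper limit is >= tsss is the band containing it.
    return _PERCENTILES[bisect_left(_UPPER, tsss)]


if __name__ == "__main__":
    tsss = total_standardized_sum(ve=55, ar=48, mk=46)
    print(tsss, afqt_percentile(tsss))
```

Storing only the upper limit of each band keeps the lookup to a single binary search; a dictionary keyed on every TSSS value would work equally well for a table of this size.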
TABLE A-3. RECAPITULATION OF TOTAL STANDARDIZED SUM OF 
SCORES, AFQT PERCENTILE SCORES AND MENTAL GROUP 
CATEGORIZATION 


TSSS RANGE      AFQT PCT SCORE RANGE      MENTAL GROUP 

258-248         99-93                     I 
247-219         92-65                     II 
218-204         64-50                     IIIA 
203-182         49-31                     IIIB 
181-166         30-21                     IVA 
165-157         20-16                     IVB 
156-145         15-10                     IVC 
144-95          9-1                       V 

Note: (1) The range of the mental groups is not symmetrical about the 
          50th percentile. 
      (2) AFQT scores are uniformly distributed. 
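
The categorization in Table A-3 amounts to a threshold lookup on the AFQT percentile score. The Python sketch below is illustrative only; the names are hypothetical, and the labels IIIB and V are assumed for the fourth and eighth categories. The helper is_cat_i_to_iiia marks scores at or above the 50th percentile, the quantity summarized as "% CAT I-IIIA" in Table B-1.

```python
# Lowest AFQT percentile score in each mental group, from Table A-3,
# ordered from the highest group to the lowest.
_GROUP_FLOORS = [
    (93, "I"),
    (65, "II"),
    (50, "IIIA"),
    (31, "IIIB"),
    (21, "IVA"),
    (16, "IVB"),
    (10, "IVC"),
    (1, "V"),
]


def mental_group(percentile: int) -> str:
    """Return the mental group category for an AFQT percentile score."""
    if not 1 <= percentile <= 99:
        raise ValueError("AFQT percentile scores run from 1 to 99")
    for floor, label in _GROUP_FLOORS:
        if percentile >= floor:
            return label
    raise AssertionError("unreachable: every valid percentile has a group")


def is_cat_i_to_iiia(percentile: int) -> bool:
    """True when the score falls in categories I through IIIA (50 or above)."""
    return mental_group(percentile) in {"I", "II", "IIIA"}


if __name__ == "__main__":
    for pct in (99, 65, 50, 49, 21, 9):
        print(pct, mental_group(pct), is_cat_i_to_iiia(pct))
```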


This appendix was developed from information in Maier and Sims (1986) and 
from information received from Milton H. Maier at the Defense Manpower Data 
Center (East) (Autovon 226-0552). 
APPENDIX B 


TABLE B-1. WEIGHTED MEANS AND STANDARD DEVIATIONS 


Variable          WM          WF          BM          BF          HM          HF 

NOBS              1156        1384        389         530         204         227 

% CAT I-IIIA      .722        .646        .214        .187        .462        .241 
                  (.448)      (.478)      (.410)      (.390)      (.499)      (.428) 

AFQT SCORE        64.540      59.542      31.337      30.381      47.007      34.828 
                  (25.533)    (24.387)    (24.018)    (21.773)    (26.614)    (21.631) 

AVERAGE           12.649      12.397      10.987      10.942      9.048       8.371 
PARENTS'          (2.528)     (2.414)     (2.603)     (2.797)     (4.644)     (4.027) 
EDUCATION 

WELFARE           .022        .028        .173        .197        .104        .135 
                  (.146)      (.165)      (.378)      (.399)      (.305)      (.342) 

POVERTY           .086        .098        .205        .326        .152        .250 
                  (.281)      (.297)      (.404)      (.469)      (.359)      (.433) 

INCOME            10859       9901        6788        6149        8239        6834 
                  (6862)      (6635)      (5286)      (4806)      (6575)      (5172) 

URBAN             .789        .760        .863        .802        .888        .946 
                  (.408)      (.427)      (.344)      (.399)      (.316)      (.226) 

Note: (1) Standard deviations are given in the parentheses and are computed 
          using the sum of the weights. 
      (2) See footnote 23 for a discussion of the income variable. 
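
A weighted mean and standard deviation of the kind reported in Table B-1 might be computed as in the Python sketch below. The exact formula used for the table is not stated; dividing the weighted sum of squared deviations by the sum of the weights is an assumption based on note (1), and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_sd(values: Sequence[float],
                     weights: Sequence[float]) -> Tuple[float, float]:
    """Return the weighted mean and standard deviation of `values`.

    Assumption: the variance divides the weighted squared deviations by the
    sum of the weights, with no degrees-of-freedom correction.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("the sum of the weights must be positive")
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    variance = sum(w * (x - mean) ** 2
                   for x, w in zip(values, weights)) / total_weight
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical 0/1 indicator with made-up sampling weights.
    mean, sd = weighted_mean_sd([0, 1, 1, 0], [150.0, 300.0, 450.0, 100.0])
    print(round(mean, 3), round(sd, 3))
```

Under this definition the standard deviation of a 0/1 variable with weighted mean p equals the square root of p(1 - p), which offers a quick consistency check on the dichotomous rows of the table.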











APPENDIX C 


(1) White males (WM) is the base case. 
(2) Coefficients and standard errors are weighted. 





OLS MODEL USING DUMMY VARIABLES FOR THE 
GENDER/RACIAL SUBGROUPS 








Coefficient     Std Error     T-statistic     P-value 
18.267          1.948         9.376           .0001 
-3.901          .792          -4.927          .0001 
-25.598         1.800         -14.220         .0001 
-25.977         1.603         -16.203         .0001 
-3.754          2.789         -1.346          .1784 
-12.978         2.112         -4.785          .0001 
3.547           .148          23.991          .0001 
-5.784          1.820         -3.177          .0015 
-1.430          1.174         -1.218          .2232 
.000161         .0000583      2.767           .0057 
-.126           .903          -.140           .8886 

F-value = 160.211     R² = .2923     N = 3890 











APPENDIX D 





TABLE D-1. YOUTH NATIONAL LONGITUDINAL SURVEY (NLSY) 
VARIABLES USED IN DATA ANALYSIS 

Variable 
Number        Variable Description and Survey Year 

R18           Urban or Rural Residence at Age 14 (1979) 
R19           With Whom did Respondent Live at Age 14 (1979) 
R65           Highest Grade Attended by Mother (1979) 
R79           Highest Grade Attended by Father (1979) 
R96           Racial/Ethnic Origin (1979) 
R182          Does Respondent Have High School Diploma or Equivalent (1979) 
R183          Which does Respondent Have, High School Diploma or GED (1979) 
R1137         Is Respondent in Military Sample or Currently on Active Duty (1979) 
R1915         Did Family Receive Welfare or Government Subsistence (1979) 
R2148         Sex of Respondent (1979) 
R2179         Total Net Family Income (1979) 
R2202         Age of Respondent (1980) 
R2299         Does Respondent Have High School Diploma or Equivalent (1980) 
R2300         Which does Respondent Have, High School Diploma or GED (1980) 
R4052         Sampling Weight (1980) 
R4060.10      Total Net Family Income (1980) 
R4181         Does Respondent Have High School Diploma or Equivalent (1981) 
R4182         Which does Respondent Have, High School Diploma or GED (1981) 
R6151         ASVAB Subtest Raw Score; Arithmetic Reasoning (1980) 
R6152         ASVAB Subtest Raw Score; Word Knowledge (1980) 
R6153         ASVAB Subtest Raw Score; Paragraph Comprehension (1980) 
R6157         ASVAB Subtest Raw Score; Mathematics Knowledge (1980) 
R6184.10      Total Net Family Income (1981) 
R6185         Family Poverty Status in 1980 (1981) 